More Constructions, More Genres: Extending Stanford Dependencies

Authors

  • Marie-Catherine de Marneffe
  • Miriam Connor
  • Natalia Silveira
  • Samuel R. Bowman
  • Timothy Dozat
  • Christopher D. Manning
Abstract

The Stanford dependency scheme aims to provide a simple and intuitive but linguistically sound way of annotating the dependencies between words in a sentence. In this paper, we address two limitations the scheme has suffered from: First, despite providing good coverage of core grammatical relations, the scheme has not offered explicit analyses of more difficult syntactic constructions; second, because the scheme was initially developed primarily on newswire data, it did not focus on constructions that are rare in newswire but very frequent in more informal texts, such as casual speech and current web texts. Here, we propose dependency analyses for several linguistically interesting constructions and extend the scheme to provide better coverage of modern web data.

Similar resources

Bertrand’s Paradox Revisited: More Lessons about that Ambiguous Word, Random

The Bertrand paradox question is: “Consider a unit-radius circle for which the length of a side of an inscribed equilateral triangle equals √3. Determine the probability that the length of a ‘random’ chord of a unit-radius circle has length greater than √3.” Bertrand derived three different ‘correct’ answers, the correctness depending on interpretation of the word, random. Here we employ geomet...

A Gold Standard Dependency Corpus for English

We present a gold standard annotation of syntactic dependencies in the English Web Treebank corpus using the Stanford Dependencies standard. This resource addresses the lack of a gold standard dependency treebank for English, as well as the limited availability of gold standard syntactic annotations for informal genres of English text. We also present experiments on the use of this resource, bo...

Converting the parallel treebank ParTUT in Universal Stanford Dependencies

Assuming the increased need of language resources encoded with shared representation formats, the paper describes a project for the conversion of the multilingual parallel treebank ParTUT into the de facto standard of the Stanford Dependencies (SD) representation. More specifically, it reports the conversion process, currently implemented as a prototype, into the Universal SD format, mor...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

A Persian Treebank with Stanford Typed Dependencies

We present the Uppsala Persian Dependency Treebank (UPDT) with a syntactic annotation scheme based on Stanford Typed Dependencies. The treebank consists of 6,000 sentences and 151,671 tokens with an average sentence length of 25 words. The data is from different genres, including newspaper articles and fiction, as well as technical descriptions and texts about culture and art, taken from the op...


Publication date: 2013